Molecular Dynamics and MM-PBSA Analysis of the SARS-CoV-2 Gamma Variant in Complex with the hACE-2 Receptor

The SARS-CoV-2 virus, since its appearance in 2019, has caused millions of cases and deaths. To date, there is no effective treatment or a vaccine that is fully protective. Despite the efforts made by governments and health institutions around the globe to control its propagation, the evolution of the virus has accelerated, diverging into hundreds of variants. However, not all of them are variants of concern (VoC’s). VoC’s have appeared in different regions and throughout the two years of the pandemic they have spread around the world. Specifically, in South America, the gamma variant (previously known as P.1) appeared in early 2021, bringing with it a second wave of infections. This variant contains the N501Y, E484K and K417T mutations in the receptor binding domain (RBD) of the spike protein. Although these mutations have been described experimentally, there is still no clarity regarding their role in the stabilization of the complex with the human angiotensin converting enzyme 2 (hACE-2) receptor. In this article we dissect the influence of mutations on the interaction with the hACE-2 receptor using molecular dynamics and estimations of binding affinity through a screened version of the molecular mechanics Poisson Boltzmann surface area (MM-PBSA) and interaction entropy. Our results indicate that mutations E484K and K417T compensate each other in terms of binding affinity, while the mutation N501Y promotes a more convoluted effect. This effect consists in the adoption of a cis configuration in the backbone of residue Y495 within the RBD, which in turn promotes polar interactions with the hACE-2 receptor. These results not only correlate with experimental observations and complement previous knowledge, but also expose new features associated with the specific contribution of concerned mutations. Additionally, we propose a recipe to assess the residue-specific contribution to the interaction entropy.


Introduction
The pandemic caused by the SARS-CoV-2 virus has hit the world's population with the loss of millions of human lives. As of March 2022, over 452 million cases have been confirmed around the world, including over 6 million deaths due to severe acute respiratory syndrome [1]. Despite the efforts of local governments to vaccinate the entire population, the appearance of new variants of SARS-CoV-2 has been associated with an increase in infections (in some cases reinfections) and deaths in recent months.
Viruses require cellular machinery to multiply and infect surrounding cells. Viral replication occurs at high speed, allowing errors or mutations to accumulate between spike protein receptor-binding domain in complex with COVOX-222 and EY6A Fabs (PDB ID: 7NXB) [19] were obtained from Protein Data Bank [22].
The gamma variant spike protein receptor-binding domain was separated from the antibodies (7NXB complex). In both structures, water molecules, hydrogens, and glycosylation were removed. The known ionic cofactors Zinc and Chlorine were maintained. The gamma variant complex was constructed through a C-α structural alignment between the wild type complex and the gamma variant spike protein receptor-binding domain (RMSD = 0.44 Å) using PyMOL v.2.0 [23].
The preparation of the complexes for simulation by molecular dynamics were done with the GROMACS v.2020.1 [24] employing the Amber99sb-ildn [25]. The simulated protein complexes were placed in a triclinic box with length sizes extending 11 Å from the extremes of the molecule in each dimension. A TIP3P water molecule model was used to solvate the system. Chloride and sodium ions were added to reach a 0.150 M of NaCl, together with a small surplus of ions, to neutralize the system's electrostatic charge. All the MD simulations were carried out considering periodic boundary conditions. The systems were energy minimized for 50,000 steps using the steepest descent algorithm. The temperature was set at 300 K and employed the Berendsen thermostat during 1ns of simulation, while position restrains of 1000 kJ mol −1 nm −2 were applied on heavy atoms of the protein. Subsequently, the pressure was equilibrated at 1 bar aided with the Berendsen barostat during 1ns while position restrains of 1000 kJ mol −1 nm −2 were applied on heavy atoms of the protein's main chain. A final equilibration step of 1 ns was performed without applying position restrains. During this period the temperature and pressure were maintained using the Verlet thermostat and Parrinello-Rahman barostat, at 300 K and 1 bar, respectively. The production run, performed under the previously equilibrated conditions, consisted of 100 ns. During the equilibration and production processes an integration time step of 2 fs was employed. Energies and compressed coordinates were saved every 100 ps.

Statistical Analysis
From the trajectories, the root-mean-square deviation (RMSD), root-mean-square fluctuation (RMSF) and hydrogen bonds were analyzed. RMSD and RMSF plots were constructed for each hACE-2 and Spike protein (wild type and gamma variant) separately. The hydrogen bond, RMSD and RMSF analyses, together with molecular geometric features, were extracted from the trajectories employing the Gromacs toolbox. The trajectories were visualized and analyzed with VMD v.1.9.4a51 and PyMOL [23,26].

Molecular Mechanics Poison-Boltzmann Surface Area (MM-PBSA)
The MM-PBSA analysis was performed with g_mmpbsa program to calculate the binding affinity and the energy contributions of the residues to the protein-protein interaction [27,28]. The calculation procedure of energy terms consisted of three steps: calculation of the potential energy in vacuum, calculation of the polar solvation energy, and the calculation of the non-polar solvation energy. The electrostatic energy was computed with the addition of an exponential damping factor according to the Debye-Huckel theory [29]. This procedure in known as screened MM-PBSA. The average binding energy and the summary of the energy terms were obtained with the script MmPbSaStat.py [27]. In addition, the average of the energy contribution of each residue was obtained by using the script MmPbSaDecomp.py [27]. To solve the linear approximation of the Poisson-Boltzmann (PB) equation, the dielectric constants for solute and solvent were set to 2 and 80, respectively. An ionic strength was set to 150 mM. The apolar contribution was computed using a surface area approximation with γ equal to 0.0226778 kJmol −1 Å −2 and b (fitting parameter) equal to 3.84982 kJmol −1 . A total of 100 frames, from each simulated complex, were employed.

Entropy Calculations
The entropy calculations were performed employing the interaction entropy method developed by Zhang et al. [30,31]. This methodology provides a rigorous theoretical framework to compute the "gas-phase" component of the entropy change departing from fluctuations around the ensemble average of the interaction energy ∆E int : − T∆S gas = kTln e ∆E int /kT (1) with: where T and k correspond to temperature and Boltzmann constant, respectively. Brackets represents ensemble average over N simulation frames. Consequently, the term ∆E n int captures the deviation in the protein-protein interaction energy of frame n with respect to ensemble average. This methodology has been successfully applied to achieve a better characterization of the binding affinities of protein-ligands and protein-protein complexes [32][33][34]. The relative entropy contribution of residue k to the interaction entropy is computed as: where ∆S gas , k is obtained from: with R ni representing the interaction entropy contribution of the residue i in the frame n. Accordantly, S and N corresponds to the total number of residues and simulation frames, respectively. Derivation of the Equation (4) is provided in the Appendix A. ∆S k captures the entropy change those results from ignoring the contribution of residue k to the total interaction entropy term. We propose the use of residue relative entropy contribution, Equation (3), as a tool to identify the key residues involved in the interaction entropy assessment. It is important to emphasize that the validity of residue relative entropy contribution, proposed here, is limited within the theoretical foundations and restrictions that allows the derivation of Equations (1) and (2) [30,31].
In the present work, the interaction energy contributions were obtained from gmmpbsa software [27]. Eight hundred frames, from the period 20-100 ns, were considered for the interaction entropy computation. The following expression was used to compare the residue relative entropy contributions of residue k between the wild type and gamma complexes:

Molecular Dynamics Analysis
Molecular dynamic simulations were performed to characterize the structural features that determine the complex between hACE-2 and the spike protein. After 100 ns of simulation, the stability of both complexes, as a dimer, stabilizes at 15 ns, reaching a value between 2.5 and 3.5 Å ( Figure 1A). For the hACE-2 receptor in both complexes, RMSD stabilizes at 15 ns to a value around the 2 Å ( Figure 1B). Similar behavior is observed in the RMSD curves for the corresponding simulations of the spike protein ( Figure 1C). Histograms were computed from each of the RMSD curves, discarding the first 20 ns, to confirm the presence of a gaussian-like distribution in the associated structural fluctuations ( Figure 1). These results support the notion that after the first 20 ns the simulated systems reached an equilibrated state. Accordantly, all subsequent analyses were performed without considering the first 20 ns. Taken together, these results indicate the absence of large structural rearrangements and advocate for the formation of a stable complex. Figure 1. RMSD throughout 100 ns of MD simulation. Alpha carbon RMSD computed over the whole complex (A), hACE-2 receptor (B) and RBD of spike protein (C) from the wild type (blue) and gamma variant (red) simulated complexes. Bold continuous lines represent the weighted moving average over the original RMSD data (standard gaussian kernel with 30 frame window). The associated colored shadows correspond to the original RMSD data. RMSD histograms presented on the right end of each plot collect information starting at 20 ns.
A RMSF analysis was performed to gain further insights on the observed differential behavior at the hACE-2 receptor in the different complexes. Fluctuations in the hACE-2 receptor were more pronounced in the wild type complex than in its variant counterpart ( Figure 2A). These differences are particularly notable at the regions comprised by segments L73-Y83, V93-K114, N134-Q139, N290-D299 and D335-Q340 ( Figure 2B). In the spike protein of the gamma variant complex, fluctuations were smaller in residues A363-S375 ( Figure 2B,C). Interestingly, most of these segments, with L73-Y83 being the exception, are not located at the complex binding interface. Thereby, the absence of major structural differences between wild type and gamma variant complexes suggests that differences observed experimentally should arise from the associated specific intermolecular interactions. Figure 2. RMSF of bound proteins. Alpha carbon RMSF computed over the whole hACE-2 receptor (A) and RBD of spike protein (C) from the wild type (blue) and gamma variant (red) simulated complexes. Bold continuous lines represent weighted moving average over the original RMSF data (standard gaussian kernel with 5 residue window). The associated colored shadows correspond to the original RMSF data. The colored bars on the top each panel highlight regions with differential behavior between the complexes. These segments are mapped on the structure with the same color code (B). Residues bearing the key mutations of the RDB are shown in black sticks.

Effect of Mutations on the hACE-RBD Interaction
In the complex with the wild type protein, the formation of a salt bridge between K417, from the spike protein RBD, and the oxygens of the D30, from hACE2, was observed throughout the 80 ns ( Figure 3A,B). The geometric features the characterize the K417 and D30 interaction are in agreement with the patterns observed in crystallographic data surveys [35]. Considering this geometric description, the salt bridge formed between K147 and D30 shows a prevalence of 82% of the analyzed time. This interaction is absent in the gamma variant complex, since the distance between the oxygens of the carboxylic acid of D30 and hydroxyl group of T417 is around 8 Å ( Figure 3C). Consequently, these results, in agreement with previous studies [34], reinforce the idea that the K417T mutation should, on its own, imprint a negative effect on the complex binding affinity. The N501Y mutation first appeared in the English variant, called the alpha variant, and resulted in an increased ability to evade humoral immunity [9]. In the literature, the effect of this mutation has been reported as an increment in the affinity for the receptor, therefore in a greater transmissibility of the virus [36]. In the wild type complex, the side chain N501 establishes hydrogen bonds with the side chains of residues Y41, K353 of the hACE2 receptor ( Figure 4A). Nevertheless, its main polar interaction is established with the side chain of Q498; which is observed 98% of the time, within the spike protein RBD. Interestingly, these polar interactions are lost with the mutation N501Y. Considering that, with the incorporation of Y501, the binding partners of N501 may relocate their polar groups, it was decided to analyze the hydrogen bonds that the side chain of residues Y41 and K353 from hACE2 and Q498 from the spike protein establish in the gamma variant ( Figure 4A). While the polar interaction of Y41 is basically unaltered upon the mutation at position 501 of the spike protein, K353 and Q498 suffered dramatic changes. Actually, the hydrogen bond interaction between K353 and Q498, with 87% prevalence in the wild type complex, is completely abolished with the mutation N501Y. Similarly, K353's side chain interaction with D38 of hACE2 and G496 of the RBD are also drastically affected. In compensation, in the gamma complex, K353 interacts with the carbonyl oxygen of Y495 with a prevalence of 54%. In the case of Q498, its side chain polar interactions are practically abolished. For this important polar partner in the wild type complex, only low frequency interactions with residues V445, G446 and T500, within the RBD, and Q42 of the hACE2 receptor are observed in the gamma variant complex. Overall, in the gamma variant complex, at the immediate surroundings of position 501, not only are fewer hydrogen being formed, but they are also doing so with less frequency than those in the wild type complex ( Figure 4A). Nevertheless, Y501 inserts itself into a pocket where it establishes a π-π stacking interaction with Y41 of hACE-2; an interaction that it is absent in the wild type complex ( Figure 4B). This stacking interaction compensates, to some extent, for the lost hydrogen bonding partners that renders the mutation N501Y. Thereby, the gamma variant remains anchored strongly with hACE-2 despite not maintaining the hydrogen bonding network throughout the simulation. Some of rearrangements described above, associated with N501Y, have been previously reported in the context of other variants and in the gamma variant itself [34,[37][38][39][40]. It is accepted that this mutation, on its own, renders an increment in the binding affinity. This is explained as an energetic compensation of contributions from the hydrogen bond formed between N501 and K353, in the wild type, and that of the π-π interaction established between Y501 and Y41, in the variant complexes, along with a set of local structural rearrangements. However, within this established narrative, it has not been discussed that the hydrogen bond formed between the K353 and G496 in the wild type complex is substituted, in the variants, by the one established between the NH3+ group of K353 and the carbonyl oxygen of Y495 ( Figure 4A). In order to this interaction to occur, the main chain of Y495 is required to adopt a cis configuration ( Figure 5A). This main chain orientation is also observed in X-ray crystallographic data of several spike-hACE2 complexes ( Figure 5B, Supplementary Materials Table S1) [21,41,42]. Interestingly, and in accordance with our simulations, the cis orientation is present in variants bearing the mutation N501Y; namely alpha, beta, gamma and omicron. Furthermore, the position 495 lays within the Loop 2 region in a more general context of coronaviruses [43]. Remarkably, this Loop 2 region was found to be essential in the recognition and binding of S proteins to hACE2. Taken together, these results establish the hypothesis that the main chain cis configuration of Y495 is promoted by the presence of tyrosine at the position 501 and, with it, its ability to bind to hACE2 is enhanced. This opens new routes to investigate the effects associated with the N501Y mutations that extend beyond the π-π interaction with Y41.
In the literature, it has been suggested that the E484K mutation confers to the virus the ability to evade the humoral immune system rather than imprinting a higher affinity for the receptor [19]. The region comprised between residues 470 and 490, bearing the E484K of the spike protein, is positioned near to a predominantly negatively charged region of hACE-2. In this context, the E484, in the wild type complex, faces an electrostatic opposition from residues E35 and E75 ( Figure 6A). In contrast, the K484 of the gamma variant provides a better charge complementary at the interface with hACE-2 ( Figure 6B). Nevertheless, neither in the wild type complex nor in the gamma variant complex, the residue at position 484 of the spike protein established a direct short-range interaction with specific residues of hACE-2. Thus, these observations suggest that the mutation E484K endows the gamma variant with a higher affinity for hACE-2, counteracting the effect of the K417T mutation.  The results presented above provide dynamic insights on the interactions of these mutations. On the simulation, the N501Y mutation rearranges the hydrogen bond network surrounding residues without compromising its enthalpic contribution. Regarding the E484K mutation, the results suggest a long-range electrostatic stabilization rather than a residue specific interaction. In contrast, the effect of the K417T mutation is much more localized since it determines the loss of a salt bridge with D30. Binding affinity estimations were performed to confirm these effects.

Effect of Mutations on Binding Affinity
In previous studies involving the wild type protein and the alpha variant of SARS-CoV-2 as well as SARS-CoV, the binding affinity values have been reported using the molecular mechanics-generalized born surface area (MM-GBSA) and MM-PBSA techniques [34,44,45]. Here, a screened version of MM-PBSA was performed in both protein complexes to assess the impact that each of the gamma variant mutations has on the binding energy. While the wild-type complex ∆H was estimated to be −55.88 ± 22.15 kcal/mol, the gamma variant complex reached an ∆H of −63.37 ± 20.59 kcal/mol (Table 1). Consequently, ∆∆H between the two complexes is expected to be around −7.49 kcal/mol, which dominates the binding process, since the −T∆∆S corresponds to −3.16 kcal/mol. Interestingly, the changes in the entropy suggest that complex formation in the gamma variant case is less penalized than in its wild type counterpart. It is important to consider that the MM-PBSA technique and interaction entropy overestimates by two to five times the delta G with respect to the experimental value [27]. With this in mind, the estimated change of ∆∆G −10.67 kcal/mol is in relatively good alignment with the reported experimental affinities (Kd wt = 26.34 ± 1.10 nM and Kd gamma = 5.16 ± 0.04 nM) [19]. A detailed observation of the free energy terms reveals that electrostatic and polar contributions condensate the main differences between the wild and gamma variant complexes (Table 1). In both cases the gamma variant displays favorable modifications according to the MM-PBSA approximation (∆∆E elec = 32.67 kcal/mol and ∆∆G pol = −45.34 kcal/mol). Nonetheless, despite this slight gain in binding affinity, this difference is not large enough to claim an increment of the gamma variant transmissibility. A closer inspection on the specific contributions of each of the three mutations in the RBM indicate that the highest binding energy contribution variation was observed in the E484K mutation ( Table 2). The computed change corresponds to ∆∆H = −1.79 kcal/mol for K417T and ∆∆H = −3.06 kcal/mol for E484K, respectively ( Table 2). The mutation N501Y showed a slightly lower modification in its contributions to the binding affinity (∆∆H=1.17 kcal/mol). On the other hand, the specific residue contribution to the interaction entropy presented the residues S19, D30 and E35 in the hACE2 receptor as more affected by the mutations (Table 3). While the entropy changes on residues D30 and K31 can be associated with the K417T mutant, those observed at S19 cannot be attributed directly to any of the mutations. The computation of residue relative interaction contribution allows the assessment of the degree of participation of each residue in the binding affinity (Table 4, Supplementary Materials Figure S1). As expected, the ∆∆G linked to mutations K417T (∆∆G = 2.32 kcal/mol) and E484K (∆∆G = −3.32 kcal/mol) has a compensatory character. Similarly, the mutation N501Y (∆∆G = 0.52 kcal/mol) shows a low net change per se.
Taken together, while mutations K417T and E484K can be interpreted as intramolecular epistasis [46], the mutation N501Y apparently triggers the rearrangement of polar and apolar interactions. Considering the above, the mutations may have little influence on the transmissibility of the virus. Nevertheless, these changes, as reviewed in the literature, have an influence concerning the interaction with neutralizing antibodies [18,47]. Specifically, in the gamma variant, a loss of affinity due to the E484K mutation has been observed with the monoclonal antibodies REGN10933 or Casirivimab (Regeneron) and Ly-CoV555 (Lilly). This loss of affinity is a consequence of the charge repulsion that occurs between lysine and the negatively charged hypervariable region of the antibodies [48]. Corresponding to the K417T mutation, K417 residue could form a salt bridge with E99 of the LY-CoV16 (Lilly) monoclonal antibody. However, this interaction is lost when the threonine takes its place in the gamma variant, resulting in a loss of binding affinity [49]. Moreover, the neutralizing antibodies COVOX-222 and EY6A were also tested to target the regions bearing mutations at residues 501 and 417. Surprisingly, the neutralization effect was unaltered [19].
Further analysis would be needed to describe the molecular elements that governs those interactions. Regarding vaccines, these have shown efficacy against a large number of variants and today almost half of the world's population, according to the World Health Organization (WHO), is vaccinated against COVID-19 [50][51][52]. However, new variants continue to appear, and others prevail due to lack of treatment and reinfection capacity. Thus, the gamma variant is no longer circulating, although it led the spread of the virus in South America during early 2021, according to GISAID and WHO data [53,54]. Despite this, we find it important due to the presence of compensatory mutations and the possible intramolecular epistasis that revolves around the evolution of SARS-CoV-2.

Conclusions
Viruses evolve to improve transmissibility, and become more lethal or evade the immune system to prevail over time. Specifically, with COVID-19, the lethality of the virus is low but its ability to transmit and reinfect seems to be improved with the appearance of the new variants. Compensatory mutations have been commonly observed in other circumstances with other viruses [46][47][48]. This work provides further evidence, based on free energy estimations, that supports the notion that the mutations K417T and E484K are compensatory. While keeping the balance on the binding affinity of the spike protein RBD towards the hACE2 receptor, they may also confer to the virus the ability to avoid the host immune system. However, the consequences associated with N501Y are more convoluted. Several research groups, employing distinct experimental and computational techniques, have reached the conclusion that the mutation N501Y confers to the virus a higher binding affinity for the hACE2 receptor [40]. Certainly, the main modification introduced by the tyrosine is the π-π interaction with Y41. Nonetheless, this mutation also disrupts the hydrogen bond between K353 of hACE2 and Q498. In principle, these losses and gains of interactions should imprint a near zero change in binding energy associated with mutation N501Y. Therefore, other consequences, additional to those already discussed in the literature, should be associated with this mutation. Our results allow us to identify the presence of cis configuration in the backbone of Y495, which allows the hydrogen bond interaction with the NH3+ group of K353. This cis configuration correlates with the presence of Y501 in X-ray crystallographic models of several variants (including alpha, beta, gamma and omicron). These structural modifications may also be present at Y495 in other coronaviruses [43]. Further analysis is required to completely unveil the consequences associated with the N501Y mutation.
− T∆S gas = kTln e ∆E int /kT (A1) with ∆E int representing fluctuation around the ensemble average of the interaction energy between the interacting molecules. T and k correspond to temperature and Boltzmann constant, respectively. The brackets represent the ensemble average computed over N simulation frames. Consequently, the logarithm argument is computed as: The theoretical foundations and applications of Equations (A1) and (A2), have been established in the literature [30][31][32][33][34]. Thereby, here we present an expression to assess the residue contribution to the interaction entropy term. Considering that the interaction entropy deviation of the frame n is computed as: where the term R ni represents the interaction entropy contribution of the residue i in the frame n. S and N corresponds to the total number of residues and simulation frames, respectively. By exchanging the summation order in the second term the Equation (A3) can be rewritten as follows: Consequently, the substitution of Equation (A4) into Equation (A2), leads to: Thus, the computation of the gas phase entropy (Equation (A1)) involves the summation over N frames of the multiplication of S factors; each for one residue. The relative entropy contribution of the residue k to the interaction entropy is computed as: T∆S k = T∆S gas − T∆S gas,k This term captures the entropy changes that results from ignoring the contribution that residue k to the total interaction entropy term. It is important to emphasize the arithmetical equivalence of Equations (A1) and (A6). Consequently, Equations (A7) and (A8) represent an attempt to capture residue specific contributions, which certainly can only be achieved regarding the total value Equation (A1). Thereby, we propose the term "residue relative entropy contribution" to describe Equation (A8).